Table of Contents

Journal of Computer and Knowledge Engineering
Volume 6, Issue 1, Winter-Spring 2023

  • Publication date: 1402/01/12 (Solar Hijri)
  • Number of articles: 8
  • Behzad Soleimani Neysiani, Seyed Morteza Babamir * Pages 1-14
    Duplicate Bug Report Detection (DBRD) is a well-known problem in software triage systems such as Bugzilla. There are two main approaches to this problem: information retrieval and machine learning, of which the latter achieves better validation performance. Duplicate detection requires feature extraction, which is a time-consuming process. Both approaches suffer from runtime issues, because every new bug report must be checked against all bug reports in the repository, and feature extraction and duplicate checking take a long time. This study proposes a new two-step classification approach that first reduces the search space of the bug repository and then checks for duplicates using textual features. The Mozilla and Eclipse datasets are used for experimental evaluation. Overall, the results show average validation performance of 87.70% accuracy and 89.01% F1-measure. Moreover, 95.85% and 87.65% of bug reports can be classified very quickly in step one for the Eclipse and Mozilla datasets, respectively, while the remaining reports require textual feature extraction before they can be checked by the traditional DBRD approach. On average, a 90% runtime improvement is achieved with the proposed method.
    Keywords: Duplicate Detection, Bug Report, Machine learning, Runtime Performance, Search Space Reduction
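The two-step idea described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the categorical features, the similarity measure (difflib's ratio stands in for the paper's textual features), the threshold, and the toy repository are all assumptions.

```python
# Step 1 prunes the search space with cheap categorical features;
# step 2 runs the costly textual similarity only on the survivors.
from difflib import SequenceMatcher

repository = [
    {"id": 1, "product": "Core", "component": "DOM", "summary": "crash when parsing large xml"},
    {"id": 2, "product": "Mail", "component": "UI",  "summary": "toolbar icons misaligned"},
]

def step1_candidates(new_report, repo):
    """Cheap filter: only reports sharing product and component survive."""
    return [r for r in repo
            if r["product"] == new_report["product"]
            and r["component"] == new_report["component"]]

def step2_duplicates(new_report, candidates, threshold=0.6):
    """Expensive textual check, run only on the pruned candidate set."""
    dups = []
    for r in candidates:
        sim = SequenceMatcher(None, new_report["summary"], r["summary"]).ratio()
        if sim >= threshold:
            dups.append(r["id"])
    return dups

new = {"product": "Core", "component": "DOM",
       "summary": "crash parsing a large xml file"}
cands = step1_candidates(new, repository)
print(len(cands), step2_duplicates(new, cands))
```

Only one of the two stored reports survives step one, so the textual comparison runs once instead of twice, which is where the runtime saving comes from.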
  • Zahra Mir *, Mohammad Allahbakhsh, Ali Maghsoudi, Haleh Amintoosi Pages 15-26
    Motifs have a critical impact on the behavioral and structural characteristics of RNA sequences. Understanding and predicting the functionality and interactions of an RNA sequence requires discovering and identifying its motifs. Owing to the importance of motif discovery in bioinformatics, a significant corpus of techniques and algorithms has been proposed, each with its own advantages and limitations and hence suitable for specific applications. To understand these techniques and algorithms, compare them, and choose the most suitable one for a particular application scenario, it is crucial to have a clear understanding of the vital aspects that characterize them. The lack of such a framework is a serious gap in the literature that needs further investigation. In this paper, we propose a taxonomy and a framework to address this issue. We define the concept of the motif discovery process and three aspects that characterize such a process: motif type, discovery technique, and application. We then survey the literature and classify the existing approaches along these aspects. This gives the reader a broader view and a more precise understanding of what these techniques and algorithms do, how they do it, and which application each of them suits best. Finally, we present the gaps and challenges we foresee as future directions for the area.
    Keywords: Algorithm, Bioinformatics, Motif Discovery, RNA Motif, taxonomy
  • Hosein Salami, Mostafa Nouri Baygi * Pages 27-35
    In recent years, several algorithms with different time complexities have been proposed for the construction of greedy spanners. However, an algorithm whose stated running time complexity appears unfavorable, namely the FG algorithm, has proved to be in practice the fastest algorithm known for this task. A common bottleneck of greedy spanner construction algorithms is their use of shortest path search operations (usually via Dijkstra's algorithm). In this paper, we propose improvements to the FG algorithm that reduce the cost imposed by the shortest path searches, and therefore the time required to construct greedy spanners. The first improvement reduces the number of calls to this operation, and the second reduces the cost of each run of the operation. Experimental results show that these improvements significantly accelerate the construction of greedy spanners compared to the other existing algorithms, especially when the stretch factor approaches 1.
    Keywords: computational geometry, greedy spanner, construction algorithm
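The greedy-spanner loop whose shortest-path calls the paper targets can be sketched as follows. This is the textbook greedy construction with a plain Dijkstra check, not the FG algorithm or the proposed improvements; the sample points and stretch factor are illustrative.

```python
# Candidate pairs are scanned in increasing length; a pair is kept only
# if the current graph distance exceeds t times its Euclidean length.
# The Dijkstra call inside the loop is the bottleneck the paper attacks.
import heapq
import math

def dijkstra(adj, src, dst):
    """Shortest path length from src to dst in a weighted adjacency dict."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            return d
        if d > dist.get(u, math.inf):
            continue
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return math.inf

def greedy_spanner(points, t):
    n = len(points)
    pairs = sorted((math.dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    adj = {i: [] for i in range(n)}
    edges = []
    for w, i, j in pairs:
        if dijkstra(adj, i, j) > t * w:   # pair not yet t-approximated
            adj[i].append((j, w))
            adj[j].append((i, w))
            edges.append((i, j))
    return edges

pts = [(0, 0), (1, 0), (2, 0), (1, 1)]
print(greedy_spanner(pts, 1.5))  # three unit edges suffice for t = 1.5
```

As the stretch factor t gets close to 1, fewer pairs are filtered out early and more Dijkstra calls are needed, which is why that regime benefits most from the improvements.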
  • Hourie Mehrabiun, Behnaz Omoomi * Pages 37-45
    Nowadays, with the development of social networks, the risk of disclosure of users' information has increased, causing serious concern among users. Accordingly, privacy preservation on social networks is a significant issue that has attracted much attention. Although various methods exist for preserving privacy on social networks, most are based on a universal approach that applies the same level of protection to all users; only a few consider individual, personalized privacy requirements, and these are limited to users' willingness to share their friend lists and sensitive information with other users. This study focuses on a new scheme of personalized privacy preservation based on k-anonymity, which can anonymize the social network graph according to the personalized privacy requirements of each individual. We develop a Modified Degree Privacy Level Sequence (MDPLS) algorithm and run experiments on two datasets. The results show that in this new method of social network graph anonymization, taking personalized privacy requirements into account reduces the cost of the anonymization process and improves data utility compared with the universal approach, which applies a single privacy level to all users.
    Keywords: Anonymous Social Network Graph, Personalized Privacy, Privacy Preserving, Social network
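A hedged sketch of the underlying idea, degree-sequence anonymization with per-user privacy levels, is shown below. The greedy grouping rule and the cost model are simplified assumptions for illustration, not the MDPLS algorithm itself.

```python
# Degrees are sorted and greedily grouped so that every group is at
# least as large as the strictest personal k inside it; all degrees in
# a group are raised to the group maximum. Cost = total degree increase.

def anonymization_cost(degree_k_pairs):
    """degree_k_pairs: list of (degree, personal_k), one per node."""
    nodes = sorted(degree_k_pairs, reverse=True)  # descending by degree
    cost, i = 0, 0
    while i < len(nodes):
        group = [nodes[i]]
        i += 1
        # grow the group until it satisfies its strictest member's k
        while len(group) < max(k for _, k in group) and i < len(nodes):
            group.append(nodes[i])
            i += 1
        top = group[0][0]
        cost += sum(top - d for d, _ in group)
    return cost

degrees = [5, 5, 4, 3, 3, 2, 2, 1]
universal = anonymization_cost([(d, 3) for d in degrees])        # k = 3 for everyone
personal = anonymization_cost(list(zip(degrees, [3, 1, 2, 1, 3, 1, 2, 1])))
print(universal, personal)
```

On this toy degree sequence the personalized requirements yield a strictly lower anonymization cost than the universal k = 3, mirroring the abstract's claim that personalization reduces cost and improves utility.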
  • Alireza Soleimany, Yousef Farhang *, Amin Babazadeh Sangar Pages 47-58
    Due to such disadvantages of current traffic light control methods as wasted time, wasted fuel and resources, and increased air pollution, providing an intelligent traffic light control system that minimizes the waiting time of vehicles and pedestrians is highly significant. Given the high priority of this issue, this paper presents an intelligent urban traffic system based on IoT data and fog computing. Fog computing is a platform at the edge of the network that provides powerful services and applications for users. Compared to cloud computing, fog computing is closer to users; it therefore collects information faster, disseminates it over the sensor network, and assists cloud computing with tasks such as preprocessing and data collection. Fog computing is a new type of distributed processing structure used for the Internet of Things. This paper proposes a method called GW-KNN. In this method, data is first collected through the Internet of Things. Then, preprocessing and extraction of effective fields are performed in the cloud layer using an improved k-nearest neighbor (KNN) machine learning algorithm. The traffic on each road is predicted for the next time slot, and this information is sent to the fog layer, where traffic control decisions are made. A Gaussian-weighted Euclidean distance is used to predict the future traffic situation, and the KNN model is included in the algorithm output to increase forecasting accuracy and ultimately solve the traffic light control problem. The idea was implemented and simulated in MATLAB on a computer with an i7-10750 processor, 16 GB of main memory, and 1 TB of external storage. The evaluation results show that the proposed method performs considerably better than the two previous methods in terms of the mean absolute percentage error of the traffic forecast and the average waiting time of each vehicle.
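A Gaussian-weighted nearest-neighbor predictor in the spirit of the GW-KNN description can be sketched as follows. The history patterns, kernel width, and k are invented for illustration; this is not the paper's implementation.

```python
# Historical traffic patterns are matched by Euclidean distance, and
# the k closest ones vote with Gaussian weights exp(-d^2 / (2*sigma^2)).
import math

def gwknn_predict(history, query, k=3, sigma=1.0):
    """history: list of (pattern, next_slot_value); query: current pattern."""
    nearest = sorted(history, key=lambda hv: math.dist(hv[0], query))[:k]
    num = den = 0.0
    for pattern, nxt in nearest:
        w = math.exp(-math.dist(pattern, query) ** 2 / (2 * sigma ** 2))
        num += w * nxt
        den += w
    return num / den

# toy (flow_t-1, flow_t) patterns and the flow observed in the next slot
history = [
    ((10, 12), 14),
    ((11, 13), 15),
    ((30, 28), 25),
    ((9, 11), 13),
]
print(gwknn_predict(history, (10, 12)))
```

The far-away pattern (30, 28) falls outside the k nearest and so contributes nothing, while the two equidistant neighbors receive equal Gaussian weights; the prediction is a smooth, distance-weighted consensus of similar past traffic states.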
  • Najva Hafizi, Mojtaba Mazoochi *, Ali Moeini, Leila Rabiei, Seyed Mohammadreza Ghaffariannia, Farzaneh Rahmani Pages 59-70
    Online social networks (OSNs) such as Facebook, Twitter, and Instagram have attracted many users all around the world. Based on the concept of centrality, many methods have been proposed to find influential users in an online social network; however, their performance is not always acceptable. In this paper, we propose a new improvement on centrality measures, called the P-centrality measure, in which the effect of a node's predecessors is considered. In an extended measure called EP-centrality, the effect of the predecessors of those predecessors is also considered. We also define a combination of two centrality measures, called NodePower (NP), to improve the effectiveness of the proposed metrics. The performance of the proposed centrality metrics, compared with conventional centrality measures, is evaluated using the Susceptible-Infected-Recovered (SIR) model. The results show that, according to Kendall's τ coefficient, the proposed metrics find influential users better than the conventional ones.
    Keywords: online social networks, Centrality measures, Influential users, Susceptible-Infected-Recovered model
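One way the predecessor idea could look in code is sketched below. The exact P-centrality formula is not given in the abstract, so the damping factor and the use of out-degree centrality as the base score are assumptions made for illustration only.

```python
# Each node's degree centrality is boosted by a damped fraction of the
# centralities of its predecessors (nodes with an edge pointing to it).

def degree_centrality(adj):
    n = len(adj)
    return {u: len(vs) / (n - 1) for u, vs in adj.items()}

def p_centrality(adj, alpha=0.5):
    """adj: directed graph given as {node: set of successors}."""
    base = degree_centrality(adj)
    preds = {u: set() for u in adj}
    for u, vs in adj.items():
        for v in vs:
            preds[v].add(u)
    return {u: base[u] + alpha * sum(base[p] for p in preds[u])
            for u in adj}

adj = {"a": {"b", "c"}, "b": {"c"}, "c": set(), "d": {"a", "c"}}
scores = p_centrality(adj)
print(max(scores, key=scores.get))
```

Here node "a" overtakes the others only once its influential predecessor "d" is taken into account, which is the kind of distinction a purely local degree measure would miss.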
  • Fatemeh Khojasteh, Behshid Behkamal *, Mohsen Kahani, Mahsa Khorasani Pages 71-79
    Business processes are subject to change during their execution over time due to new legislation, seasonal effects, and so on. The detection of process changes is also called business process drift detection. Existing methods make the accuracy of drift detection unfavorably dependent on the choice of window size. Furthermore, most methods struggle to select appropriate features that capture the relations between traces or events. This paper draws on the notion of trace embedding to propose a new framework (Trace2Vec CDD) for the automatic detection of sudden process drifts. The main contributions of the proposed approach are: (i) it is window-independent; (ii) the trace embedding used for drift detection makes it possible to automatically extract all features from the relations between traces; and (iii) as attested by synthetic event logs, our approach is superior to current methods in accuracy and drift detection delay.
    Keywords: Process mining, Concept drift, Process changes, Word embedding
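A window-free drift signal in this spirit can be sketched as follows. A simple bag-of-activities vector stands in for the learned trace embedding, and the threshold and toy event log are assumptions, not the framework's actual components.

```python
# A drift is flagged wherever the cosine distance between consecutive
# trace vectors spikes above a threshold -- no sliding window involved.
import math
from collections import Counter

def embed(trace, alphabet):
    """Bag-of-activities vector (a stand-in for a learned embedding)."""
    c = Counter(trace)
    return [c[a] for a in alphabet]

def cosine_distance(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))

def detect_drifts(log, threshold=0.5):
    alphabet = sorted({a for t in log for a in t})
    vecs = [embed(t, alphabet) for t in log]
    return [i for i in range(1, len(log))
            if cosine_distance(vecs[i - 1], vecs[i]) > threshold]

log = [list("abc"), list("abc"), list("abcc"),   # stable behavior
       list("xyz"), list("xyz")]                 # sudden change
print(detect_drifts(log))
```

Small variations between similar traces (an extra repeated activity) stay below the threshold, while a sudden switch to a different activity set produces a clear spike at the drift point.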
  • Mohammad Ansari Shiri, Najme Mansouri * Pages 81-100
    Recent advances in science, engineering, and technology have created massive datasets. Machine learning and data mining techniques cannot perform well on these huge datasets because they contain redundant, noisy, and irrelevant features. The purpose of feature selection is to reduce the dimensionality of datasets by selecting the most relevant attributes while simultaneously increasing classification accuracy. Meta-heuristic optimization techniques have become increasingly popular for feature selection in recent years thanks to their ability to overcome the limitations of traditional optimization methods. This paper presents a binary version of the Manta Ray Foraging Optimizer (MRFO). To further reduce cost and computation time, we also incorporate Spearman's correlation coefficient into the proposed method, yielding Correlation Based Binary Manta Ray Foraging (CBBMRF). It eliminates highly positively correlated features at the start of the computation, avoiding additional calculations and leading to faster subset selection. The presented algorithms are compared with five state-of-the-art meta-heuristics on 10 standard UCI datasets. The results demonstrate the superior performance of the proposed algorithms on feature selection problems.
    Keywords: Feature selection, Optimization, Correlation, Accuracy
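The correlation pre-filter step can be sketched as follows. Only the Spearman-based redundancy removal is shown here; the binary MRFO search itself is omitted, and the cutoff and toy data are illustrative.

```python
# Feature pairs whose Spearman rank correlation exceeds a cutoff are
# treated as redundant; the later feature of each such pair is dropped
# before any metaheuristic search runs, shrinking its search space.

def ranks(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def spearman(x, y):
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    m = (n - 1) / 2
    cov = sum((a - m) * (b - m) for a, b in zip(rx, ry))
    var = sum((a - m) ** 2 for a in rx)
    return cov / var  # rank variances are equal when there are no ties

def correlation_filter(features, cutoff=0.9):
    """features: dict name -> column of values."""
    names = list(features)
    dropped = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a not in dropped and b not in dropped:
                if spearman(features[a], features[b]) > cutoff:
                    dropped.add(b)
    return [n for n in names if n not in dropped]

features = {
    "f1": [1, 2, 3, 4, 5],
    "f2": [2, 4, 6, 8, 10],   # monotone copy of f1 -> redundant
    "f3": [5, 3, 4, 1, 2],
}
print(correlation_filter(features))
```

Because f2 is a monotone transform of f1, its Spearman correlation with f1 is exactly 1 and it is removed, while the uncorrelated f3 survives; the subsequent binary search then only has to explore subsets of the remaining features.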